Riemannian metrics for neural networks
Abstract
We describe four algorithms for neural network training, each adapted to different scalability constraints. These algorithms are mathematically principled and invariant under a number of transformations in data and network representation, from which performance is thus independent. These algorithms are obtained from the setting of differential geometry, and are based on either the natural gradient using the Fisher information matrix, or on Hessian methods, scaled down in a specific way to allow for scalability while keeping some of their key mathematical properties.

The most standard way to train neural networks, backpropagation, has several known shortcomings. Convergence can be quite slow. Backpropagation is sensitive to data representation: for instance, even such a simple operation as exchanging 0's and 1's on the input layer will affect performance (Figure 1), because this amounts to changing the parameters (weights and biases) in a non-trivial way, resulting in different gradient directions in parameter space, and better performance with 1's than with 0's. (In the related context of restricted Boltzmann machines, it has been found that the standard training technique by gradient ascent favors setting hidden units to 1, for very much the same reason [AAHO11, Section 5].) This specific phenomenon disappears if, instead of the logistic function, the hyperbolic tangent is used as the activation function. Scaling also has an effect on performance: for instance, a common recommendation [LBOM96] is to use 1.7159 tanh(2x/3) instead of just tanh(x) as the activation function.

It would be interesting to have algorithms whose performance is insensitive to particular choices such as scaling factors in network construction, parameter encoding or data representation. Such invariance properties mean more robustness for an algorithm: good performance on a particular problem presumably indicates good performance over a whole class of problems equivalent to the first one by simple (e.g., affine) transformations. Ways exist to deal with these issues, such as Hessian methods or the natural gradient, which are invariant (and thus preserve performance) over a wide class of changes in the representation of the data and of the network. However, these are generally not scalable (the cost of maintaining the whole Hessian or Fisher information matrix grows quadratically with the number of network parameters).
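To make the natural-gradient idea concrete, below is a minimal sketch, assuming a toy logistic-regression model, of the update θ ← θ − η F(θ)⁻¹ ∇θ L, where F is the Fisher information matrix. This is not one of the paper's four algorithms; the function names, damping constant, and synthetic data are illustrative assumptions.

```python
# Minimal illustrative sketch of a natural-gradient step for logistic regression.
# Not the paper's algorithm; names, damping, and data are assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def natural_gradient_step(theta, X, y, lr=0.1, damping=1e-4):
    """One update theta <- theta - lr * F^{-1} grad, with F the Fisher information matrix."""
    p = sigmoid(X @ theta)                        # model probabilities p(y=1 | x), shape (N,)
    grad = X.T @ (p - y) / len(y)                 # gradient of the mean negative log-likelihood
    # Fisher information for the Bernoulli output model: F = E[ p(1-p) x x^T ]
    F = (X * (p * (1 - p))[:, None]).T @ X / len(y)
    F += damping * np.eye(len(theta))             # small damping keeps F well conditioned
    return theta - lr * np.linalg.solve(F, grad)  # natural-gradient update

# Tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
theta = np.zeros(3)
for _ in range(100):
    theta = natural_gradient_step(theta, X, y)
```

Because the Fisher matrix rescales the gradient according to the model's own geometry, the update direction is unchanged under smooth reparameterizations of the parameters, which is the invariance property the abstract refers to; the paper's contribution is obtaining scalable approximations of such updates rather than maintaining the full matrix as in this toy example.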
Similar resources
On Special Generalized Douglas-Weyl Metrics
In this paper, we study a special class of generalized Douglas-Weyl metrics whose Douglas curvature is constant along any Finslerian geodesic. We prove that for every Landsberg metric in this class of Finsler metrics, ? = 0 if and only if H = 0. Then we show that every Finsler metric of non-zero isotropic flag curvature in this class of metrics is Riemannian if and only if ? = 0.
ON THE LIFTS OF SEMI-RIEMANNIAN METRICS
In this paper, we extend the Sasaki metric for the tangent bundle of a Riemannian manifold and the Sasaki-Mok metric for the frame bundle of a Riemannian manifold [1] to the case of a semi-Riemannian vector bundle over a semi-Riemannian manifold. In fact, if E is a semi-Riemannian vector bundle over a semi-Riemannian manifold M, then by using an arbitrary (linear) connection on E, we can make E, as a...
Improved learning of Riemannian metrics for exploratory analysis
We have earlier introduced a principle for learning metrics, which shows how metric-based methods can be made to focus on discriminative properties of data. The main applications are in supervising unsupervised learning to model interesting variation in data, instead of modeling all variation as plain unsupervised learning does. The metrics are derived by approximations to an information-geomet...
On quasi-Einstein Finsler spaces
The notion of quasi-Einstein metric in physics is equivalent to the notion of Ricci soliton in Riemannian spaces. Quasi-Einstein metrics also serve as solutions to the Ricci flow equation. Here, the Riemannian metric is replaced by a Hessian matrix derived from a Finsler structure and a quasi-Einstein Finsler metric is defined. In the compact case, it is proved that the quasi-Einstein met...
Riemannian metrics for neural networks II: recurrent networks and learning symbolic data sequences
Recurrent neural networks are powerful models for sequential data, able to represent complex dependencies in the sequence that simpler models such as hidden Markov models cannot handle. Yet they are notoriously hard to train. Here we introduce a training procedure using a gradient ascent in a Riemannian metric: this produces an algorithm independent from design choices such as the encoding of p...
An adaptive estimation method to predict human thermal comfort indices using deep belief neural network classification
Many experimental and theoretical indices of human thermal comfort and discomfort are calculated from climatic input data such as wind speed, temperature, humidity, solar radiation, etc. Daily data on temperature, wind speed, relative humidity, and cloudiness for the years 1382-1392 were used. In the first step, the Tmrt parameter was calculated in the Ray...
Journal: CoRR
Volume: abs/1303.0818
Year of publication: 2013